Classification of Verb Particle Constructions with the Google Web1T Corpus

Authors

Jonathan K. Kummerfeld

James R. Curran

Abstract

Manually maintaining comprehensive databases of multi-word expressions, for example Verb-Particle Constructions (VPCs), is infeasible. We describe a new type-level classifier for potential VPCs, which uses information in the Google Web1T corpus to perform a simple linguistic constituency test. Specifically, we consider the fronting test, comparing the frequencies of the two possible orderings of the given verb and particle. Using only a small set of queries for each verb-particle pair, the system achieved an F-score of 75.7% in our evaluation while processing thousands of queries per second.
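As a rough illustration of the idea of comparing the two orderings of a verb and particle, the following sketch computes a smoothed frequency ratio from a table of n-gram counts. It is a minimal sketch under stated assumptions, not the authors' classifier: the function names `fronting_ratio` and `looks_like_vpc`, the add-one smoothing, the threshold of 10, and the counts in `TOY_COUNTS` are all hypothetical and chosen only for illustration. The counts are invented placeholders, not values from the Web1T corpus.

```python
from typing import Mapping


def fronting_ratio(counts: Mapping[str, int], verb: str, particle: str) -> float:
    """Ratio of the 'verb particle' count to the 'particle verb' count.

    Add-one smoothing (an assumption of this sketch) avoids division by
    zero when an ordering is unseen.
    """
    normal = counts.get(f"{verb} {particle}", 0)
    fronted = counts.get(f"{particle} {verb}", 0)
    return (normal + 1) / (fronted + 1)


def looks_like_vpc(
    counts: Mapping[str, int],
    verb: str,
    particle: str,
    threshold: float = 10.0,  # hypothetical cut-off, not from the paper
) -> bool:
    """Flag a pair as a likely VPC when the fronted order is comparatively rare."""
    return fronting_ratio(counts, verb, particle) >= threshold


# Invented placeholder counts for illustration only (NOT real Web1T data).
TOY_COUNTS = {
    "give up": 999,
    "up give": 9,
    "run across": 120,
    "across run": 59,
}

if __name__ == "__main__":
    for verb, particle in [("give", "up"), ("run", "across")]:
        ratio = fronting_ratio(TOY_COUNTS, verb, particle)
        verdict = looks_like_vpc(TOY_COUNTS, verb, particle)
        print(f"{verb} {particle}: ratio={ratio:.2f}, likely VPC={verdict}")
```

The intuition behind this reading of the fronting test is that a true particle resists being moved in front of its verb, so the fronted order should be much rarer than the normal order. A real system would look the counts up in the corpus and would likely combine several queries per pair rather than rely on one ratio.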


Similar resources

Statistical Techniques for Automatically Inferring the Semantics of Verb-Particle Constructions

This paper describes an investigation of some potential features for a statistical approach to inferring the semantics of verb-particle constructions from corpus data. Verb-particles cause particular problems for the computational semantic analysis of language, because their meaning often cannot be derived through the usual compositional methods of analysis. Two novel techniques are presented w...


Automatic Identification Of English Verb Particle Constructions Using Linguistic Features

This paper presents a method for identifying token instances of verb particle constructions (VPCs) automatically, based on the output of the RASP parser. The proposed method pools together instances of VPCs and verb-PPs from the parser output and uses the sentential context of each such instance to differentiate VPCs from verb-PPs. We show our technique to perform at an F-score of 97.4% at iden...


A Statistical Approach To The Semantics Of Verb-Particles

This paper describes a distributional approach to the semantics of verb-particle constructions (e.g. put up, make off). We report first on a framework for implementing and evaluating such models. We then go on to report on the implementation of some techniques for using statistical models acquired from corpus data to infer the meaning of verb-particle constructions.


Verb-Particle Constructions in the World Wide Web

In this paper we investigate the phenomenon of verb-particle constructions, discussing their characteristics and their availability for use with NLP systems. Combinations automatically extracted from corpora greatly improve the coverage of available resources. However, the data sparseness problem is particularly acute for these constructions and even using a corpus as large as the British Natio...


USYD: WSD and Lexical Substitution using the Web1T corpus

This paper describes the University of Sydney’s WSD and Lexical Substitution systems for SemEval-2007. These systems are principally based on evaluating the substitutability of potential synonyms in the context of the target word. Substitutability is measured using Pointwise Mutual Information as obtained from the Web1T corpus. The WSD systems are supervised, while the Lexical Substitution syst...



Publication date: 2008
